Part 1 of the Pearson Live Training Session “Hands–On Data Visualization with ggplot2” for O’Reilly
{ggplot2} Package
{ggplot2} is a system for declaratively creating graphics, based on “The Grammar of Graphics” (Wilkinson, 2005). You provide the data, tell {ggplot2} how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
ggplot2 is a data visualization package for the programming language R created by Hadley Wickham.
It should already be installed on your system (if not, run the first line in the following chunk). The functionality of the package can be loaded by calling library(), as for any other package:
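A minimal sketch of such a chunk, assuming installation from CRAN; the install line is commented out so that it only runs when needed:

```r
## install.packages("ggplot2")  # uncomment if the package is not installed yet
library(ggplot2)                # attach the package for this session
```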
ggplot2 is part of the tidyverse package collection. Thus, you can also load tidyverse without running library(ggplot2):
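A sketch of the tidyverse route, again assuming the collection is installed (or installable from CRAN):

```r
## install.packages("tidyverse")  # uncomment if the collection is not installed yet
library(tidyverse)  # attaches ggplot2 together with dplyr, readr, tibble and other core packages
```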
We use cryptocurrency financial data, pulled from CoinMarketCap.com. For our purposes, we limit the data to the period 08/2017–12/2019 and the top 4 cryptocurrencies.
I have already prepared the data. If you want to know how, you can have a look here.
Using the read_csv() function from the {readr} package, we can read the data directly from the web:
url <- "https://raw.githubusercontent.com/z3tt/hands-on-ggplot2/main/data/crypto_cleaned.csv"
data <- readr::read_csv(url)
data
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
Of course, one can import local files as well:
data_local <- readr::read_csv("data/crypto_cleaned.csv")
data_local
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
This assumes that you have placed the file in a folder called data in your working directory.
You can specify this directory via setwd() or, preferably, use R projects.
The so-called namespace allows you to access functions from a package directly without loading it first.
packagename::function(argument)
Furthermore, it helps readers understand which package a function comes from.
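As a small illustration of the namespace notation, here is a sketch using median() from the {stats} package, which ships with R. The vector `x` is made up for the example:

```r
x <- c(1, 3, 5, 7)

## Namespace notation: no library() call needed, and the origin is explicit
a <- stats::median(x)

## Bare function name: works only because {stats} is attached
## (it is attached by default in every R session)
b <- median(x)

identical(a, b)
```

Both calls run the same function; the first one simply states where it lives.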
We need to specify the data in the ggplot() call:
ggplot(data = data)
There is only an empty panel because ggplot2 doesn’t know which part of the data it should plot.
We need to specify two variables we want to plot as positional aesthetics:
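A sketch of such a call with both positional aesthetics spelled out, using date and close as in the later examples. The small `toy` data frame is a made-up stand-in with the same column names so that the snippet runs on its own; in the session you would pass `data` instead.

```r
library(ggplot2)

## Hypothetical stand-in for the crypto data (same column names, four rows)
toy <- data.frame(
  date  = as.Date("2019-12-01") + 0:3,
  close = c(15.5, 15.2, 15.3, 15.3)
)

## Map date to the x axis and close to the y axis; no layer is added yet
p <- ggplot(data = toy, mapping = aes(x = date, y = close))
p
```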
There is only an empty panel because ggplot2 doesn’t know how it should plot the data.
Thanks to implicit matching of arguments in ggplot() and aes(), we can also write:
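A sketch of the shorter form: the first two arguments of ggplot() are data and mapping, and the first two arguments of aes() are x and y, so the names can be dropped. The `toy` data frame is again a made-up stand-in for `data`.

```r
library(ggplot2)

## Hypothetical stand-in for the crypto data
toy <- data.frame(
  date  = as.Date("2019-12-01") + 0:3,
  close = c(15.5, 15.2, 15.3, 15.3)
)

## Same specification as ggplot(data = toy, mapping = aes(x = date, y = close))
p_short <- ggplot(toy, aes(date, close))
p_short
```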
By adding one or multiple layers we can tell ggplot2 how to represent the data. There are lots of built-in geometric elements (geoms) and statistical transformations (stats):
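One way to get an overview is to list the exported functions of the package by prefix. This is only a sketch; the variable names are made up, and the exact number of results depends on the installed ggplot2 version.

```r
## All functions exported by {ggplot2}
exports <- getNamespaceExports("ggplot2")

## Keep those whose names start with geom_ or stat_
geom_fns <- sort(grep("^geom_", exports, value = TRUE))
stat_fns <- sort(grep("^stat_", exports, value = TRUE))

head(geom_fns)
head(stat_fns)
```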
We can tell ggplot2 to represent the data for example as a scatter plot:
ggplot(data, aes(date, close)) +
geom_point()
Aesthetics do not only refer to x and y positions, but also groupings, colors, fills, shapes etc.
ggplot(data = data, mapping = aes(x = date, y = close, color = currency)) +
geom_point()
You can replace the default theme with one of the other built-in themes via theme_set(). You can also adjust some global settings, for example base_size, whose default (11) is often too small.
theme_set(theme_light(base_size = 18))
By using theme_set() the new theme is used for any plot you create afterwards! Give it a try: go back to the last chunk and re-run the code to generate the colored scatter plot.
The exciting thing about layers is that you can combine several geom_*() and stat_*() calls:
ggplot(data, aes(date, close, color = currency)) +
geom_line() +
geom_point()
… and aesthetics can be applied either globally:
ggplot(data, aes(date, close, color = currency, shape = currency)) +
geom_line() +
geom_point()
… or for each layer individually:
ggplot(data, aes(date, close)) +
geom_line(aes(color = currency)) +
geom_point(aes(shape = currency))
chic <- readr::read_csv(
"https://raw.githubusercontent.com/z3tt/ggplot-courses/master/data/chicago-nmmaps.csv"
)
Plot temperature (temp) versus day (date), then bring in season (season) and year (year).
You can export your plot via the ggsave() function:
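A minimal sketch of an export, under a few assumptions: the `toy` data frame is a made-up stand-in for `data`, the file name is hypothetical, and the plot is written to a temporary directory so nothing in your project is overwritten. ggsave() picks the output format from the file extension.

```r
library(ggplot2)

## Hypothetical stand-in for the crypto data
toy <- data.frame(
  date  = as.Date("2019-12-01") + 0:3,
  close = c(15.5, 15.2, 15.3, 15.3)
)

p <- ggplot(toy, aes(date, close)) +
  geom_line()

## Hypothetical output file; width and height are given in inches by default
out_file <- file.path(tempdir(), "my_plot.pdf")
ggsave(filename = out_file, plot = p, width = 8, height = 5)
```

If `plot` is omitted, ggsave() saves the last plot that was displayed.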
-> Scales, Coordinate Systems, Facets, Themes, and Annotations will follow later
“ggplot2: Elegant Graphics for Data Analysis”, free–access book by Hadley Wickham et al.
“R for Data Science”, free–access book by Hadley Wickham
“Data Visualization: A Practical Introduction”, free–access book by Kieran Healy
“A {ggplot2} Tutorial for Beautiful Plotting in R”, my extensive “how to”-tutorial
{here} Package
A good workflow when working with local files is offered by the {here} package in combination with R projects:
here::here()
[1] "C:/Users/Freya Watkins/OneDrive - University of Birmingham/Desktop/R/hands-on-ggplot2-training"
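A sketch of how the path returned by here::here() can be used for the import. It assumes the file sits in a folder called data below the project root, as in the local import earlier; the object names are made up.

```r
## Build the path relative to the project root, not the working directory
csv_path <- here::here("data", "crypto_cleaned.csv")
csv_path

## Read the file only if it is actually there
if (file.exists(csv_path)) {
  data_here <- readr::read_csv(csv_path)
  print(data_here)
}
```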
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
The base R function read.csv() works in the same way as readr::read_csv():
currency date open high low close year month yday
1 binance-coin 2019-12-04 15.35 15.69 15.01 15.28 2019 12 338
2 binance-coin 2019-12-03 15.19 15.55 15.05 15.31 2019 12 337
3 binance-coin 2019-12-02 15.51 15.71 15.15 15.19 2019 12 336
4 binance-coin 2019-12-01 15.74 15.74 15.05 15.50 2019 12 335
5 binance-coin 2019-11-30 16.26 16.37 15.54 15.72 2019 11 334
6 binance-coin 2019-11-29 15.68 16.34 15.65 16.27 2019 11 333
… and we can turn it into a tibble afterwards:
data <- tibble::as_tibble(data)
data
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <int>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
However, note that by default the date column is turned into type character.
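If you need proper dates after a base R import, the column can be converted with as.Date(). The sketch below uses three rows copied from the printout above and passes them as text so that it runs on its own; the object names are made up.

```r
## Three rows of the data, reduced to three columns
csv_text <- "currency,date,close
binance-coin,2019-12-04,15.28
binance-coin,2019-12-03,15.31
binance-coin,2019-12-02,15.19"

data_base <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(data_base$date)  # character after the import

## The strings are in ISO format (YYYY-MM-DD), so no format string is needed
data_base$date <- as.Date(data_base$date)
class(data_base$date)  # Date after the conversion
```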
The import() function from the {rio} package lets you load many kinds of data formats:
#install.packages("rio")
data <- rio::import(here::here("data", "crypto_cleaned.csv"))
head(data) ## use just head because the output is very long
currency date open high low close year month yday
1 binance-coin 2019-12-04 15.35 15.69 15.01 15.28 2019 12 338
2 binance-coin 2019-12-03 15.19 15.55 15.05 15.31 2019 12 337
3 binance-coin 2019-12-02 15.51 15.71 15.15 15.19 2019 12 336
4 binance-coin 2019-12-01 15.74 15.74 15.05 15.50 2019 12 335
5 binance-coin 2019-11-30 16.26 16.37 15.54 15.72 2019 11 334
6 binance-coin 2019-11-29 15.68 16.34 15.65 16.27 2019 11 333
We can turn it into a tibble afterwards—or specify it directly when importing the data set:
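Both options can be sketched as follows. The setclass = "tbl" argument is the one used for the Excel import further down; the temporary file holds only three rows copied from the printout above, so the result is much shorter than the full data set, and the object names are made up.

```r
## Write a small excerpt to a temporary CSV file so the snippet runs on its own
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "currency,date,close",
  "binance-coin,2019-12-04,15.28",
  "binance-coin,2019-12-03,15.31",
  "binance-coin,2019-12-02,15.19"
), tmp)

## Option 1: import first, convert to a tibble afterwards
data_a <- tibble::as_tibble(rio::import(tmp))

## Option 2: ask for a tibble directly when importing
data_b <- rio::import(tmp, setclass = "tbl")

class(data_b)
```

For the full file, the path would be here::here("data", "crypto_cleaned.csv") instead of `tmp`.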
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <date> <dbl> <dbl> <dbl> <dbl> <int> <int> <int>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
You could also load, for example, JSON or Excel files with the same function:
data_json <- rio::import(here::here("data", "crypto_cleaned.json"))
data_json <- as_tibble(data_json) ## somehow setclass doesn't work with json
data_json
<div class="layout-chunk" data-layout="l-body">
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class='va'>data_xlsx</span> <span class='op'>&lt;-</span> <span class='fu'>rio</span><span class='fu'>::</span><span class='fu'><a href='https://rdrr.io/pkg/rio/man/import.html'>import</a></span><span class='op'>(</span>
<span class='fu'>here</span><span class='fu'>::</span><span class='fu'><a href='https://here.r-lib.org//reference/here.html'>here</a></span><span class='op'>(</span><span class='st'>"data"</span>, <span class='st'>"crypto_cleaned.xlsx"</span><span class='op'>)</span>,
setclass <span class='op'>=</span> <span class='st'>"tbl"</span>
<span class='op'>)</span>
<span class='va'>data_xlsx</span>
</code></pre></div>
…1 currency date open high low close year
</div>
We can remove the first column by using the `select()` function from the `{dplyr}` package:
<div class="layout-chunk" data-layout="l-body">
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class='va'>data_xlsx</span> <span class='op'>&lt;-</span> <span class='fu'>dplyr</span><span class='fu'>::</span><span class='fu'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='op'>(</span><span class='va'>data_xlsx</span>, <span class='op'>-</span><span class='fl'>1</span><span class='op'>)</span>
<span class='co'>#data_xlsx <- dplyr::select(data_xlsx, currency:yday)</span>
<span class='va'>data_xlsx</span>
</code></pre></div>
currency date open high low close year month
</div>
---
### Aesthetics: aes()
Some prefer to place the `aes()` outside the `ggplot()` call:
<div class="layout-chunk" data-layout="l-body">
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class='fu'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='op'>(</span><span class='va'>data</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='op'>(</span>x <span class='op'>=</span> <span class='va'>date</span>, y <span class='op'>=</span> <span class='va'>close</span><span class='op'>)</span>
</code></pre></div>
<img src="01-grammar_files/figure-html5/structure-aes-outside-1.png" width="960" />
</div>
### Coordinate Systems: coord_*()
The coordinate system maps the two position aesthetics (x and y) to a 2D position on the plot:
<div class="layout-chunk" data-layout="l-body">
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class='fu'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='op'>(</span><span class='va'>data</span>, <span class='fu'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='op'>(</span>x <span class='op'>=</span> <span class='va'>date</span>, y <span class='op'>=</span> <span class='va'>close</span>,
color <span class='op'>=</span> <span class='va'>currency</span><span class='op'>)</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_date.html'>scale_x_date</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_continuous</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_colour_discrete.html'>scale_color_discrete</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/coord_cartesian.html'>coord_cartesian</a></span><span class='op'>(</span><span class='op'>)</span>
</code></pre></div>
<img src="01-grammar_files/figure-html5/structure-coord-1.png" width="960" />
</div>
<div class="layout-chunk" data-layout="l-body">
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class='fu'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='op'>(</span><span class='va'>data</span>, <span class='fu'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='op'>(</span>x <span class='op'>=</span> <span class='va'>date</span>, y <span class='op'>=</span> <span class='va'>close</span>,
color <span class='op'>=</span> <span class='va'>currency</span><span class='op'>)</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_date.html'>scale_x_date</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_continuous</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_colour_discrete.html'>scale_color_discrete</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_polar</a></span><span class='op'>(</span><span class='op'>)</span>
</code></pre></div>
<img src="01-grammar_files/figure-html5/structure-coord-polar-1.png" width="960" />
</div>
Changing the limits on the coordinate system allows you to zoom in:
<div class="layout-chunk" data-layout="l-body">
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class='fu'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='op'>(</span><span class='va'>data</span>, <span class='fu'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='op'>(</span>x <span class='op'>=</span> <span class='va'>date</span>, y <span class='op'>=</span> <span class='va'>close</span>,
color <span class='op'>=</span> <span class='va'>currency</span><span class='op'>)</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_date.html'>scale_x_date</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_continuous</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/scale_colour_discrete.html'>scale_color_discrete</a></span><span class='op'>(</span><span class='op'>)</span> <span class='op'>+</span>
<span class='fu'><a href='https://ggplot2.tidyverse.org/reference/coord_cartesian.html'>coord_cartesian</a></span><span class='op'>(</span>
xlim <span class='op'>=</span> <span class='fu'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='op'>(</span><span class='fu'><a href='https://rdrr.io/r/base/as.Date.html'>as.Date</a></span><span class='op'>(</span><span class='st'>"2018-11-01"</span><span class='op'>)</span>,
<span class='fu'><a href='https://rdrr.io/r/base/as.Date.html'>as.Date</a></span><span class='op'>(</span><span class='st'>"2019-11-01"</span><span class='op'>)</span><span class='op'>)</span>,
ylim <span class='op'>=</span> <span class='fu'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='op'>(</span><span class='cn'>NA</span>, <span class='fl'>100</span><span class='op'>)</span>
<span class='op'>)</span>
</code></pre></div>
<img src="01-grammar_files/figure-html5/structure-coord-zoom-1.png" width="960" />
</div>
---
## Session Info
<details><summary>Expand for details</summary>
<div class="layout-chunk" data-layout="l-body">
[1] “2021-09-03 17:17:37 CEST”
Local: main C:/Users/Freya Watkins/OneDrive - University of Birmingham/Desktop/R/hands-on-ggplot2-training Remote: main @ origin (https://github.com/z3tt/hands-on-ggplot2-training.git) Head: [8b3a779] 2021-08-31: fix link zip
R version 4.0.5 (2021-03-31) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale: [1] LC_COLLATE=English_United Kingdom.1252 [2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages: [1] stats graphics grDevices utils datasets methods
[7] base
other attached packages: [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[5] readr_1.4.0 tidyr_1.1.3 tibble_3.1.2 ggplot2_3.3.5
[9] tidyverse_1.3.1
loaded via a namespace (and not attached): [1] httr_1.4.2 sass_0.4.0 jsonlite_1.7.2
[4] here_1.0.1 modelr_0.1.8 bslib_0.2.5.1
[7] assertthat_0.2.1 highr_0.9 cellranger_1.1.0 [10] yaml_2.2.1 pillar_1.6.1 backports_1.2.1
[13] glue_1.4.2 digest_0.6.27 rvest_1.0.0
[16] colorspace_2.0-2 htmltools_0.5.1.1 pkgconfig_2.0.3
[19] broom_0.7.8 haven_2.4.1 scales_1.1.1
[22] openxlsx_4.2.4 distill_1.2 rio_0.5.27
[25] downlit_0.2.1 git2r_0.28.0 generics_0.1.0
[28] farver_2.1.0 ellipsis_0.3.2 withr_2.4.2
[31] cli_3.0.0 magrittr_2.0.1 crayon_1.4.1
[34] readxl_1.3.1 evaluate_0.14 fs_1.5.0
[37] fansi_0.5.0 xml2_1.3.2 foreign_0.8-81
[40] textshaping_0.3.5 tools_4.0.5 data.table_1.14.0 [43] hms_1.1.0 lifecycle_1.0.0 munsell_0.5.0
[46] reprex_2.0.0 zip_2.2.0 compiler_4.0.5
[49] jquerylib_0.1.4 systemfonts_1.0.2 rlang_0.4.11
[52] grid_4.0.5 rstudioapi_0.13 labeling_0.4.2
[55] rmarkdown_2.9 gtable_0.3.0 DBI_1.1.1
[58] curl_4.3.2 R6_2.5.0 lubridate_1.7.10 [61] knitr_1.33 utf8_1.2.1 rprojroot_2.0.2
[64] ragg_1.1.3 stringi_1.7.4 Rcpp_1.0.7
[67] vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.1 [70] xfun_0.24
</div>
</details>